On Neurons Invariant to Sentence Structural Changes in Neural Machine Translation
We present a methodology that explores how sentence structure is reflected in
neural representations of machine translation systems. We demonstrate our
model-agnostic approach with the Transformer English-German translation model.
We analyze neuron-level correlation of activations between paraphrases while
discussing the methodological challenges and the need for confound analysis to
isolate the effects of shallow cues. We find that similarity between activation
patterns can be mostly accounted for by similarity in word choice and sentence
length. Following that, we manipulate neuron activations to control the
syntactic form of the output. We show this intervention to be somewhat
successful, indicating that deep models capture sentence-structure
distinctions, despite finding no such indication at the neuron level. To
conduct our experiments, we develop a semi-automatic method to generate
meaning-preserving minimal pair paraphrases (active-passive voice and adverbial
clause-noun phrase) and compile a corpus of such pairs.
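
As a rough illustration of the neuron-level correlation analysis described in this abstract, the sketch below computes one Pearson correlation per neuron across a set of paraphrase pairs. It is not the authors' code. It assumes each sentence has already been reduced to one activation value per neuron (for example by pooling over tokens), and the names `pearson` and `neuron_correlations` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def neuron_correlations(acts_a, acts_b):
    """Correlate each neuron's activation across paraphrase pairs.

    acts_a[i][j] is the (assumed pre-pooled) activation of neuron j on the
    first sentence of pair i; acts_b[i][j] is the same for its paraphrase.
    Returns one correlation per neuron.
    """
    if len(acts_a) != len(acts_b):
        raise ValueError("both inputs need one row per paraphrase pair")
    # Transpose rows (pairs) into columns (neurons).
    cols_a = list(zip(*acts_a))
    cols_b = list(zip(*acts_b))
    return [pearson(ca, cb) for ca, cb in zip(cols_a, cols_b)]
```

A high correlation for a neuron says only that it responds similarly to both members of a pair. As the abstract notes, shared word choice and sentence length can explain much of that similarity, so such scores would need a confound analysis before being read as evidence about sentence structure.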
Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion
Model fusion research aims to aggregate the knowledge of multiple models to
enhance performance by combining their weights. In this work, we study the
inverse, investigating whether and how model fusion can interfere with and reduce
unwanted knowledge. We delve into the effects of model fusion on the evolution
of learned shortcuts, social biases, and memorization capabilities in
fine-tuned language models. Through several experiments covering text
classification and generation tasks, our analysis highlights that shared
knowledge among models is usually enhanced during model fusion, while unshared
knowledge is usually lost or forgotten. Based on this observation, we
demonstrate the potential of model fusion as a debiasing tool and showcase its
efficacy in addressing privacy concerns associated with language models.
Comment: 16 pages, 9 figures, 6 tables
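
The abstract above describes fusion as combining the weights of several models. One simple way to do that is a (weighted) average of corresponding parameters, sketched below on plain Python dictionaries. This is an illustrative assumption, not necessarily the fusion method used in the paper, and `fuse_weights` and the parameter layout are hypothetical.

```python
def fuse_weights(models, coeffs=None):
    """Fuse models by a weighted average of their parameters.

    models: list of dicts mapping a parameter name to a flat list of floats.
    coeffs: optional mixing coefficients, one per model (default: uniform).
    All models are assumed to share the same parameter names and sizes.
    """
    if not models:
        raise ValueError("need at least one model")
    if coeffs is None:
        coeffs = [1.0 / len(models)] * len(models)
    if len(coeffs) != len(models):
        raise ValueError("need one coefficient per model")
    names = models[0].keys()
    for m in models:
        if m.keys() != names:
            raise ValueError("models must share the same parameter names")
    fused = {}
    for name in names:
        size = len(models[0][name])
        if any(len(m[name]) != size for m in models):
            raise ValueError("parameter sizes differ for " + name)
        # Average element i of this parameter across all models.
        fused[name] = [
            sum(c * m[name][i] for c, m in zip(coeffs, models))
            for i in range(size)
        ]
    return fused
```

Under uniform averaging, a parameter shift that all models share survives intact, while a shift present in only one of N models is scaled by 1/N. That offers one intuition for the reported pattern of shared knowledge being kept and unshared knowledge being lost.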